Dumfries and Galloway
- North America > United States > Minnesota (0.04)
- Europe > United Kingdom > Scotland > Scottish Borders (0.04)
- Europe > United Kingdom > Scotland > Dumfries and Galloway (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Overview (0.68)
UK lacks plan to defend itself from invasion, MPs warn
The UK lacks a plan to defend itself from military attack, a committee of MPs has warned. In a highly critical report, the defence committee says the UK is over-reliant on US resources and that preparations to defend itself and overseas territories in the event of attack are nowhere near where they need to be. The committee's chair, Labour MP Tan Dhesi, said: Putin's brutal invasion of Ukraine, unrelenting disinformation campaigns, and repeated incursions into European airspace mean that we cannot afford to bury our heads in the sand. It comes as the Ministry of Defence (MoD) identified parts of the country where six or more new munitions factories could be built. In June, Defence Secretary John Healey announced plans to move the UK to war-fighting readiness, including £1.5bn to support the construction of new munitions factories, which will be built by private contractors.
- Europe > Ukraine (0.26)
- South America (0.15)
- North America > Central America (0.15)
- (20 more...)
- Media (1.00)
- Government > Regional Government > Europe Government > United Kingdom Government (1.00)
- Government > Military (1.00)
- North America > United States > Minnesota (0.04)
- Europe > United Kingdom > Scotland > Scottish Borders (0.04)
- Europe > United Kingdom > Scotland > Dumfries and Galloway (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Overview (0.68)
LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data
Yang, Cehao, Lin, Xueyuan, Xu, Chengjin, Jiang, Xuhui, Ma, Shengjie, Liu, Aofan, Xiong, Hui, Guo, Jian
Despite the growing development of long-context large language models (LLMs), data-centric approaches relying on synthetic data have been hindered by issues related to faithfulness, which limit their effectiveness in enhancing model performance on tasks such as long-context reasoning and question answering (QA). These challenges are often exacerbated by misinformation caused by lack of verification, reasoning without attribution, and potential knowledge conflicts. We propose LongFaith, a novel pipeline for synthesizing faithful long-context reasoning instruction datasets. By integrating ground truth and citation-based reasoning prompts, we eliminate distractions and improve the accuracy of reasoning chains, thus mitigating the need for costly verification processes. We open-source two synthesized datasets, LongFaith-SFT and LongFaith-PO, which systematically address multiple dimensions of faithfulness, including verified reasoning, attribution, and contextual grounding. Extensive experiments on multi-hop reasoning datasets and LongBench demonstrate that models fine-tuned on these datasets significantly improve performance. Our ablation studies highlight the scalability and adaptability of the LongFaith pipeline, showcasing its broad applicability in developing long-context LLMs.
- North America > Panama (0.18)
- Europe > Hungary (0.15)
- Europe > Poland (0.14)
- (12 more...)
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF
Zhang, Hanning, Song, Juntong, Zhu, Juno, Wu, Yuanhao, Zhang, Tong, Niu, Cheng
Retrieval-augmented generation (RAG) enhances Large Language Models (LLMs) with relevant and up-to-date knowledge, improving their ability to answer knowledge-intensive questions. It has been shown to enhance both generation quality and trustworthiness. While numerous works have focused on improving retrieval, generation, and evaluation, the role of reward models in reinforcement learning for optimizing RAG remains underexplored. In this paper, we introduce \textbf{RAG-Reward}, a framework designed to develop reward models to enable \textit{hallucination-free, comprehensive, reliable, and efficient RAG}. We define four key metrics to assess generation quality and develop an automated benchmarking pipeline to evaluate the outputs of multiple LLMs across a variety of RAG scenarios. Using \textbf{RAG-Reward}, we train reward models and apply {reinforcement learning with human feedback (RLHF)} to improve LLMs' effectiveness in RAG. Experimental results demonstrate that our reward model achieves state-of-the-art performance in automatic benchmarking and aligns closely with human evaluations. Furthermore, the improved generation quality of the trained policy model highlights the feasibility and efficiency of using RLHF to enhance RAG outputs.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- (7 more...)
Man guilty of army veteran hammer attack murder
Man guilty of army veteran hammer attack murder Cumbria PoliceJack Crawley attempted to burn Paul Taylor's body, before burying him in woodland A man who attacked an army veteran he had met for sex and bludgeoned him with a hammer has been found guilty of murder. Paul Taylor, 57, from Annan, Dumfriesshire, went missing last October, with his remains found in a shallow grave in woodland near Carlisle, Cumbria, in May. Jack Crawley, 20, of Carlisle, was found guilty of attacking him and trying to burn his body following a trial at the city's crown court. He will be sentenced on Wednesday. Crawley was also found guilty of the attempted murder of a man in York, who he met on the gay dating app Grindr and also attacked with a hammer, while he was on bail for killing Mr Taylor.
- North America > United States (0.53)
- Europe > United Kingdom > England > Cumbria (0.51)
- Europe > United Kingdom > Scotland > Dumfries and Galloway (0.25)
- (14 more...)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Government > Military > Army (0.83)
- Government > Regional Government > North America Government > United States Government (0.34)
LLM Evaluators Recognize and Favor Their Own Generations
Panickssery, Arjun, Bowman, Samuel R., Feng, Shi
Self-evaluation using large language models (LLMs) has proven valuable not only in benchmarking but also methods like reward modeling, constitutional AI, and self-refinement. But new biases are introduced due to the same LLM acting as both the evaluator and the evaluatee. One such bias is self-preference, where an LLM evaluator scores its own outputs higher than others' while human annotators consider them of equal quality. But do LLMs actually recognize their own outputs when they give those texts higher scores, or is it just a coincidence? In this paper, we investigate if self-recognition capability contributes to self-preference. We discover that, out of the box, LLMs such as GPT-4 and Llama 2 have non-trivial accuracy at distinguishing themselves from other LLMs and humans. By fine-tuning LLMs, we discover a linear correlation between self-recognition capability and the strength of self-preference bias; using controlled experiments, we show that the causal explanation resists straightforward confounders. We discuss how self-recognition can interfere with unbiased evaluations and AI safety more generally.
- North America > United States > New York (0.04)
- North America > United States > Minnesota (0.04)
- Europe > United Kingdom > Scotland > Scottish Borders (0.04)
- (3 more...)
- Research Report > Experimental Study (0.54)
- Research Report > New Finding (0.46)
On the Benefits of Fine-Grained Loss Truncation: A Case Study on Factuality in Summarization
Flores, Lorenzo Jaime Yu, Cohan, Arman
Text summarization and simplification are among the most widely used applications of AI. However, models developed for such tasks are often prone to hallucination, which can result from training on unaligned data. One efficient approach to address this issue is Loss Truncation (LT) (Kang and Hashimoto, 2020), an approach to modify the standard log loss to adaptively remove noisy examples during training. However, we find that LT alone yields a considerable number of hallucinated entities on various datasets. We study the behavior of the underlying losses between factual and non-factual examples, to understand and refine the performance of LT. We demonstrate that LT's performance is limited when the underlying assumption that noisy targets have higher NLL loss is not satisfied, and find that word-level NLL among entities provides better signal for distinguishing factuality. We then leverage this to propose a fine-grained NLL loss and fine-grained data cleaning strategies, and observe improvements in hallucination reduction across some datasets. Our work is available at https://https://github.com/yale-nlp/fine-grained-lt.
- Europe > United Kingdom > Scotland > Scottish Borders (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
- (11 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
How to Discern Important Urgent News?
Vasilyev, Oleg, Bohannon, John
We found that a simple property of clusters in a clustered dataset of news correlate strongly with importance and urgency of news (IUN) as assessed by LLM. We verified our finding across different news datasets, dataset sizes, clustering algorithms and embeddings. The found correlation should allow using clustering (as an alternative to LLM) for identifying the most important urgent news, or for filtering out unimportant articles.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > China > Hong Kong (0.04)
- Africa > Namibia (0.04)
- (9 more...)
- Leisure & Entertainment > Sports (1.00)
- Government (0.68)
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks
Jiang, Shuli, Kadhe, Swanand Ravindra, Zhou, Yi, Cai, Ling, Baracaldo, Nathalie
Growing applications of large language models (LLMs) trained by a third party raise serious concerns on the security vulnerability of LLMs. It has been demonstrated that malicious actors can covertly exploit these vulnerabilities in LLMs through poisoning attacks aimed at generating undesirable outputs. While poisoning attacks have received significant attention in the image domain (e.g., object detection), and classification tasks, their implications for generative models, particularly in the realm of natural language generation (NLG) tasks, remain poorly understood. To bridge this gap, we perform a comprehensive exploration of various poisoning techniques to assess their effectiveness across a range of generative tasks. Furthermore, we introduce a range of metrics designed to quantify the success and stealthiness of poisoning attacks specifically tailored to NLG tasks. Through extensive experiments on multiple NLG tasks, LLMs and datasets, we show that it is possible to successfully poison an LLM during the fine-tuning stage using as little as 1% of the total tuning data samples. Our paper presents the first systematic approach to comprehend poisoning attacks targeting NLG tasks considering a wide range of triggers and attack settings. We hope our findings will assist the AI security community in devising appropriate defenses against such threats.
- Europe > United Kingdom > Scotland > Scottish Borders (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Europe > United Kingdom > Scotland > Dumfries and Galloway (0.04)
- (7 more...)
- Health & Medicine (0.92)
- Information Technology > Security & Privacy (0.88)